Assumptions
Linearity: The relationship between x and y is linear
Independence: Observations are independent
Homoscedasticity: Constant variance of residuals
Normality: Residuals are normally distributed
Example: Association between age and plasma concentration
m0 <- lm(conc ~ age, data = df)
summary(m0)
Call:
lm(formula = conc ~ age, data = df)
Residuals:
Min 1Q Median 3Q Max
-42.492 -12.643 -0.968 9.998 50.981
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 126.7136 10.0424 12.618 3.68e-15 ***
age 1.3124 0.2619 5.011 1.28e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19.55 on 38 degrees of freedom
Multiple R-squared: 0.3979, Adjusted R-squared: 0.3821
F-statistic: 25.11 on 1 and 38 DF, p-value: 1.282e-05
Are the assumptions fulfilled?
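One way to approach this question is with residual plots. The sketch below is only an illustration: the slides' `df` is not available here, so it simulates a hypothetical stand-in with the same column names (`age`, `conc`). Independence cannot be read off such plots and has to be judged from the study design.

```r
# Hypothetical stand-in for the slides' `df` (simulated, not the original data)
set.seed(1)
df <- data.frame(age = runif(40, 20, 60))
df$conc <- 130 + 1.3 * df$age + rnorm(40, sd = 20)

m0 <- lm(conc ~ age, data = df)

# Linearity and homoscedasticity: residuals vs fitted values should show
# no trend and a roughly constant spread
plot(fitted(m0), resid(m0), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should lie close to the reference line
qqnorm(resid(m0))
qqline(resid(m0))
```

Clusters of residuals that sit consistently above or below zero can hint at a grouping variable missing from the model.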
Add group as a variable.
m1 <- lm(conc ~ age + group, data = df)
summary(m1)
Call:
lm(formula = conc ~ age + group, data = df)
Residuals:
Min 1Q Median 3Q Max
-16.849 -8.185 0.786 6.618 17.656
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 303.4541 15.9759 18.994 < 2e-16 ***
age -1.9022 0.3042 -6.254 3.58e-07 ***
group2 -110.9344 9.7480 -11.380 2.59e-13 ***
group3 -82.1424 7.4540 -11.020 6.29e-13 ***
group4 -44.7618 5.3530 -8.362 7.33e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.24 on 35 degrees of freedom
Multiple R-squared: 0.8762, Adjusted R-squared: 0.862
F-statistic: 61.92 on 4 and 35 DF, p-value: 2.173e-15
Linear mixed models
Linear regression assumes that all observations are independent. This assumption is violated when observations are grouped.
Ignoring the grouping structure can lead to:
Misleading conclusions
Incorrect standard errors
Inflated type I error rates
A linear mixed model accounts for the grouping structure by including random effects.
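The inflated type I error rate can be illustrated with a small simulation. This is a sketch under assumed settings (10 groups of 10 observations, a covariate that is constant within each group, and no true effect of that covariate); none of these numbers come from the slides.

```r
set.seed(42)
n_groups <- 10    # hypothetical number of groups
n_per    <- 10    # hypothetical observations per group
n_sim    <- 500   # number of simulated data sets

p_values <- replicate(n_sim, {
  group   <- rep(seq_len(n_groups), each = n_per)
  x_group <- rnorm(n_groups)           # group-level covariate, no effect on y
  b       <- rnorm(n_groups, sd = 1)   # group effect shared by all members
  y <- b[group] + rnorm(n_groups * n_per, sd = 1)
  x <- x_group[group]
  # Ordinary regression that ignores the grouping structure
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of data sets where the null effect is declared significant
type1_rate <- mean(p_values < 0.05)
type1_rate
```

If the observations were truly independent, this rate would be close to the nominal 0.05. With grouped data analysed as if independent, it is expected to be considerably higher.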
Grouping structures
Grouped observations are very common in the life sciences:
Repeated measures: Multiple measurements from the same subject.
Longitudinal studies: Measurements of the same subject at multiple time points.
Nested designs: Measurements of mice within cages within labs.
Multi-omics studies: Omics data (genomics, transcriptomics, proteomics, etc.) from the same individual.
Experimental designs: Technical replicates (same sample), or measurements from different batches, labs, regions, etc.
Mixed effects models
Fixed effects
Population level effects
Estimated explicitly
Random effects
Account for variation between groups (subjects, batches, etc.)
Assume that the group effects follow a distribution across groups
Group effects are not estimated individually; instead, their variance is estimated
Grouping structure
The grouping structure can be cohorts, batches, subjects, schools, hospitals, doctors, cages, labs, etc.
It should be determined from the study design and not inferred from the data.
Note: The grouping is always categorical.
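In R, a grouping variable stored as numeric codes is best converted to a factor so that it is treated as categorical. A minimal sketch, using a hypothetical data frame with group codes 1 to 4:

```r
# Hypothetical data: group is stored as integer codes 1-4
df <- data.frame(group = rep(1:4, each = 3))

# Convert to a factor so R treats it as categorical rather than numeric
df$group <- factor(df$group)

levels(df$group)
nlevels(df$group)
```

Without the conversion, a term such as `age + group` in `lm()` would fit a single slope for `group` instead of one coefficient per level.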
Model with fixed and random effects
Once the grouping is decided, we can decide what effects are fixed and what effects are random.
The effects can be:
Intercept
Slope
Interaction
Example: Concentration and age
Model the association between plasma concentration and age.
The grouping structure here could be e.g. different clinics.
Model as fixed or random effect
We believe that:
the effect of age on concentration is the same across clinics, hence a fixed effect
the intercept (baseline concentration) varies between clinics
Options for the intercept
Include group as a fixed effect to explicitly estimate each group’s baseline. This consumes more degrees of freedom, but would allow us to compare clinics.
Model intercept as a random effect across groups. Estimate variance over groups, but not the intercept for each group. This is more parsimonious and allows us to focus on the overall effect of age without estimating each clinic’s baseline concentration.
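Model with random intercept
\(y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0i} + \varepsilon_{ij}\)
\(y_{ij}\): concentration for observation \(j\) in group \(i\); \(x_{ij}\): the corresponding age
\(\beta_0, \beta_1\): fixed effects (intercept and slope)
\(b_{0i}\): random intercept for group \(i\)
\(b_{0i} \sim N(0, \sigma_b^2)\), where \(\sigma_b\) is the standard deviation of the random intercept
\(\varepsilon_{ij}\): residual error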
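To make the roles of \(\beta_0\), \(\beta_1\), \(b_{0i}\) and \(\varepsilon_{ij}\) concrete, the sketch below simulates data from a random-intercept model. All parameter values and the name `df_sim` are hypothetical choices for illustration, not estimates from the slides' data.

```r
set.seed(7)
n_groups <- 4     # hypothetical number of groups
n_per    <- 10    # hypothetical observations per group

beta0   <- 240    # fixed intercept (assumed value)
beta1   <- -1.8   # fixed slope for age (assumed value)
sigma_b <- 45     # SD of the random intercept (assumed value)
sigma   <- 9      # residual SD (assumed value)

group <- rep(seq_len(n_groups), each = n_per)
b0    <- rnorm(n_groups, mean = 0, sd = sigma_b)      # one draw per group
age   <- runif(n_groups * n_per, min = 20, max = 60)
eps   <- rnorm(n_groups * n_per, mean = 0, sd = sigma)

# conc_ij = beta0 + beta1 * age_ij + b0_i + eps_ij
conc <- beta0 + beta1 * age + b0[group] + eps

df_sim <- data.frame(conc = conc, age = age, group = factor(group))
head(df_sim)
```

Every observation in a group shares the same draw `b0[group]`, which is what makes observations within a group correlated.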
Mixed model in R
library(lme4)
mm <- lmer(conc ~ age + (1 | group), data = df)
age: Fixed effect
(1 | group): Random intercept for group
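A self-contained version of this fit, for readers without access to `df`, could look as follows. The data are simulated with hypothetical parameter values (and 8 groups rather than 4), so the estimates will not match the output shown on these slides.

```r
library(lme4)

# Simulated stand-in data (hypothetical values, not the slides' `df`)
set.seed(7)
group <- rep(1:8, each = 10)
b0    <- rnorm(8, sd = 45)                 # random intercepts, one per group
age   <- runif(80, min = 20, max = 60)
conc  <- 240 - 1.8 * age + b0[group] + rnorm(80, sd = 9)
df_sim <- data.frame(conc = conc, age = age, group = factor(group))

# Fixed effect of age, random intercept per group
mm_sim <- lmer(conc ~ age + (1 | group), data = df_sim)

fixef(mm_sim)                    # estimated fixed effects
as.data.frame(VarCorr(mm_sim))   # variance components (sdcor column holds SDs)
```

`fixef()` returns the population-level intercept and slope, while `VarCorr()` reports the between-group and residual variance.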
summary(mm)
Linear mixed model fit by REML ['lmerMod']
Formula: conc ~ age + (1 | group)
Data: df
REML criterion at convergence: 304.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.75481 -0.89181 0.09125 0.70848 1.85339
Random effects:
Groups Name Variance Std.Dev.
group (Intercept) 2224.44 47.164
Residual 85.47 9.245
Number of obs: 40, groups: group, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 241.3334 26.0615 9.260
age -1.8293 0.3014 -6.068
Correlation of Fixed Effects:
(Intr)
age -0.422
Random slope
If you believe that the slope varies between groups, include a random slope for age in addition to the random intercept.
Random slope in R
mm2 <- lmer(conc ~ age + (1 + age | group), data = df)
summary(mm2)
Linear mixed model fit by REML ['lmerMod']
Formula: conc ~ age + (1 + age | group)
Data: df
REML criterion at convergence: 302.9
Scaled residuals:
Min 1Q Median 3Q Max
-1.5822 -0.8157 0.2107 0.7805 1.6553
Random effects:
Groups Name Variance Std.Dev. Corr
group (Intercept) 4113.8342 64.1392
age 0.3925 0.6265 -0.67
Residual 77.1173 8.7816
Number of obs: 40, groups: group, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 249.2426 34.0378 7.323
age -1.9862 0.4373 -4.542
Correlation of Fixed Effects:
(Intr)
age -0.682
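One possible way to judge whether the random slope is worth keeping is to compare the two models with `anova()`. The sketch below uses simulated data with hypothetical parameter values and 12 groups; the names `mm_int` and `mm_slope` are illustrative. `refit = FALSE` keeps the REML fits, which is reasonable here because the two models share the same fixed effects.

```r
library(lme4)

# Simulated stand-in data with group-specific slopes (hypothetical values)
set.seed(11)
n_groups <- 12
n_per    <- 10
group <- rep(seq_len(n_groups), each = n_per)
b0    <- rnorm(n_groups, sd = 40)    # random intercepts
b1    <- rnorm(n_groups, sd = 0.6)   # random slopes for age
age   <- runif(n_groups * n_per, min = 20, max = 60)
conc  <- 240 + b0[group] + (-1.8 + b1[group]) * age +
  rnorm(n_groups * n_per, sd = 9)
df_sim <- data.frame(conc = conc, age = age, group = factor(group))

mm_int   <- lmer(conc ~ age + (1 | group), data = df_sim)
mm_slope <- lmer(conc ~ age + (1 + age | group), data = df_sim)

# Likelihood ratio comparison of the two random-effects structures
cmp <- anova(mm_int, mm_slope, refit = FALSE)
cmp
```

The p-value from such a comparison should be read with caution, since a variance of zero lies on the boundary of the parameter space. With only a few groups, as in the 4-group example above, variance estimates are also imprecise.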
Predicting random effects
The random effects can be predicted using the ranef() function; the fixed effects are extracted with fixef().
fixef(mm)
(Intercept) age
241.333393 -1.829255
ranef(mm)
$group
(Intercept)
1 58.13116
2 -50.26973
3 -22.21498
4 14.35354
with conditional variances for "group"
The overall mean is given by the fixed effects, and the random effects are each group's deviation from this mean. For example, the predicted intercept for group 1 is 241.33 + 58.13 = 299.46.
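The relation between fixed effects, predicted random effects and group-specific coefficients can be checked in code. A minimal sketch on simulated stand-in data (hypothetical values; `mm_sim` and `group_intercepts` are illustrative names):

```r
library(lme4)

# Simulated stand-in data (hypothetical values, not the slides' `df`)
set.seed(7)
group <- rep(1:8, each = 10)
b0    <- rnorm(8, sd = 45)
age   <- runif(80, min = 20, max = 60)
conc  <- 240 - 1.8 * age + b0[group] + rnorm(80, sd = 9)
df_sim <- data.frame(conc = conc, age = age, group = factor(group))

mm_sim <- lmer(conc ~ age + (1 | group), data = df_sim)

fixef(mm_sim)         # overall intercept and slope
ranef(mm_sim)$group   # predicted deviation of each group's intercept
coef(mm_sim)$group    # group-specific intercepts and the common slope

# Group-specific intercept = fixed intercept + predicted random intercept
group_intercepts <- fixef(mm_sim)[["(Intercept)"]] +
  ranef(mm_sim)$group[["(Intercept)"]]
all.equal(group_intercepts, coef(mm_sim)$group[["(Intercept)"]])
```

`coef()` saves the manual addition and is convenient when group-level predictions are needed.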